Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sunny
by New Contributor III
  • 15021 Views
  • 1 reply
  • 1 kudos

Resolved! Maximum duration of the Databricks job before it times out

May I know the maximum duration a job is allowed to run if Timeout is not set? https://docs.databricks.com/data-engineering/jobs/jobs.html

Latest Reply
Sivaprasad1
Databricks Employee
  • 1 kudos

This is part of the configuration of the task itself, so if no timeout is specified, it can theoretically run forever (e.g. a streaming use case). Please refer to the timeout section in the link below: https://docs.databricks.com/dev-tools/api/latest/jobs.html#ope...

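If you do want a hard cap, timeout_seconds is the relevant field on the job (and per task). A minimal sketch of setting it when creating a job through the Jobs 2.1 API from Python; the host, token, notebook path, and cluster id are placeholders, not values from the thread:

    import requests

    host = "https://<your-workspace>.cloud.databricks.com"   # placeholder
    token = "<personal-access-token>"                        # placeholder

    job_spec = {
        "name": "nightly-etl",
        "timeout_seconds": 3600,   # fail the run after 1 hour; omitted or 0 means no timeout
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Repos/etl/main"},
            "existing_cluster_id": "<cluster-id>",
        }],
    }

    resp = requests.post(f"{host}/api/2.1/jobs/create",
                         headers={"Authorization": f"Bearer {token}"},
                         json=job_spec)
    resp.raise_for_status()
    print(resp.json())   # {"job_id": ...}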
mihai
by New Contributor III
  • 10198 Views
  • 7 replies
  • 31 kudos

Resolved! Workspace deployment on AWS - CloudFormation Issue

Hello, I have been trying to deploy a workspace on AWS using the quickstart feature, and I have been running into a problem where the stack fails when trying to create a resource. The following resource(s) failed to create: [CopyZips]. From the CloudWat...

Latest Reply
GarethGraphy
New Contributor III
  • 31 kudos

Dropping by with my experience in case anyone lands here via Google. Note that the databricks-prod-public-cfts bucket is located in us-west-2. If your AWS organisation has an SCP which whitelists specific regions (such as this example) and us-west-2 is...

6 More Replies
Shay
by New Contributor III
  • 9374 Views
  • 8 replies
  • 6 kudos

Resolved! How do you Upload TXT and CSV files into Shared Workspace in Databricks?

I am trying to upload the needed files under the right directory of the project to work. The files are zipped first, as that is an accepted format. I have a Python project which requires the TXT and CSV format files, as they are called and used via .py files ...

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

@Shay Alam, can you share the code with which you read the files? Apparently Python interprets the file format as a language, so it seems some options are not filled in correctly.

7 More Replies
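For reference, a minimal sketch of the two usual ways to read an uploaded CSV or TXT file on Databricks; the paths are hypothetical. Files in DBFS are also visible to plain Python through the /dbfs fuse mount:

    # Read with Spark, stating the format options explicitly
    df = (spark.read
          .format("csv")
          .option("header", True)
          .option("inferSchema", True)
          .load("dbfs:/FileStore/project/data.csv"))   # hypothetical path

    # Read a TXT file with plain Python via the /dbfs fuse mount
    with open("/dbfs/FileStore/project/data.txt") as f:
        lines = f.readlines()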
PJ
by New Contributor III
  • 4873 Views
  • 8 replies
  • 1 kudos

Please bring back notebook names in Google Chrome tabs

Please bring back notebook names in Google Chrome tabs. This feature seems to have disappeared within the last 24 hours. Now, each tab just reads "Databricks" at the top. I often have multiple Databricks scripts open at the same time and it is re...

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

The fix has been pushed to all regions during their release maintenance window. So if your workspace is deployed with the new release, then you should be able to see the notebook names in the browser tab.

7 More Replies
zesdatascience
by New Contributor III
  • 5242 Views
  • 2 replies
  • 2 kudos

Resolved! Delta Live Tables with CDC and Database Views with Lower Case Names

Hi, I am testing out creating some Delta Live Tables using Change Data Capture, and I am having an issue where the resulting views are created with lower-case column names. Here is the function I am using to ingest data: def raw_to_ods_merge(table_name, s...

Latest Reply
zesdatascience
New Contributor III
  • 2 kudos

Hi @Kaniz Fatma, I have not found a solution just yet, but it is not a priority, as most users will be accessing the data through Databricks SQL, so no further assistance is required right now. Thanks

1 More Reply
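One possible workaround, not from the thread itself (which ended unresolved): Spark SQL identifiers are case-insensitive but case-preserving, so a downstream select can re-alias columns back to the desired case. A sketch with hypothetical column names:

    from pyspark.sql.functions import col

    # Hypothetical mapping from lower-case names to the desired case
    desired_case = {"invoiceno": "InvoiceNo", "customerid": "CustomerId"}

    def restore_case(df):
        # Alias each column to its mapped name; unmapped columns pass through unchanged
        return df.select(*[col(c).alias(desired_case.get(c, c)) for c in df.columns])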
sdaza
by New Contributor III
  • 28330 Views
  • 12 replies
  • 5 kudos

Displaying Pandas Dataframe

I had this issue when displaying pandas dataframes. Any ideas on how to display a pandas dataframe? display(mydataframe) raises: Exception: Cannot call display(<class 'pandas.core.frame.DataFrame'>)

Latest Reply
Tim_Green
New Contributor II
  • 5 kudos

A simple way to get a nicely formatted table from a pandas dataframe: displayHTML(df.to_html()). to_html has some parameters you can use to control the output. If you want something less basic, try out this code I wrote, which adds scrolling and some ...

11 More Replies
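Both fixes from the thread in one sketch. display and displayHTML are notebook built-ins, so this only runs inside a Databricks notebook:

    import pandas as pd

    pdf = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

    # Option 1: render pandas' own HTML table
    displayHTML(pdf.to_html())

    # Option 2: convert to a Spark dataframe, which display() accepts
    display(spark.createDataFrame(pdf))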
steelman
by New Contributor III
  • 19277 Views
  • 6 replies
  • 8 kudos

Resolved! How to flatten non-standard JSON files in a dataframe

Hello, I have a non-standard JSON file with a nested structure that I am having issues with. Here is an example of the JSON file: jsonfile = """[ { "success":true, "numRows":2, "data":{ "58251":{ "invoiceno":"58...

Desired format in the dataframe after processing the JSON file.
Latest Reply
Deepak_Bhutada
Databricks Employee
  • 8 kudos

@stale stokkereit, you can use the function below to flatten the struct fields: import pyspark.sql.functions as F def flatten_df(nested_df): flat_cols = [c[0] for c in nested_df.dtypes if c[1][:6] != 'struct'] nested_cols = [c[0] for c in nest...

5 More Replies
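For completeness, a reconstruction of the truncated helper above: the standard recipe that flattens one level of struct columns per call, so loop it for deeper nesting. Reconstructed from the visible fragment, so treat it as a sketch:

    import pyspark.sql.functions as F

    def flatten_df(nested_df):
        # Split columns into plain columns and struct columns
        flat_cols = [c[0] for c in nested_df.dtypes if c[1][:6] != "struct"]
        nested_cols = [c[0] for c in nested_df.dtypes if c[1][:6] == "struct"]
        # Promote each struct field to a top-level column named parent_child
        return nested_df.select(
            flat_cols
            + [F.col(nc + "." + c).alias(nc + "_" + c)
               for nc in nested_cols
               for c in nested_df.select(nc + ".*").columns])

    # e.g. df = spark.read.json(sc.parallelize([jsonfile]))
    # then repeat until no struct columns remain:
    # while any(t[:6] == "struct" for _, t in df.dtypes):
    #     df = flatten_df(df)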
Adalberto
by New Contributor II
  • 5905 Views
  • 4 replies
  • 2 kudos

Resolved! cannot resolve '(CAST(10000 AS BIGINT) div Khe)' due to data type mismatch:

Hi, I'm trying to create a Delta table using SQL but I'm getting this error: Error in SQL statement: AnalysisException: cannot resolve '(CAST(10000 AS BIGINT) div Khe)' due to data type mismatch: differing types in '(CAST(10000 AS BIGINT) div Khe)' (big...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hi @Adalberto Garcia Espinosa, do you need the Khe column to be double? If not, the query below is working: %sql CREATE OR REPLACE TABLE Productos(Khe bigint NOT NULL, Fctor_HL_Estiba bigint GENERATED ALWAYS AS (cast(10000 as bigint) div Khe)) seems to be work...

3 More Replies
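The reply's DDL, spelled out and runnable from a Python cell. div is integer division, so both operands must have integral types; declaring Khe as BIGINT (rather than DOUBLE) is what makes the generated column resolve:

    spark.sql("""
        CREATE OR REPLACE TABLE Productos (
            Khe BIGINT NOT NULL,
            Fctor_HL_Estiba BIGINT GENERATED ALWAYS AS (CAST(10000 AS BIGINT) div Khe)
        )
    """)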
Ambi
by New Contributor III
  • 6318 Views
  • 5 replies
  • 8 kudos

Resolved! Access Azure storage account from Databricks notebook using PySpark or SQL

I have a storage account (Azure Blob Storage) with a container, and inside the container a CSV file. I couldn't read the file using the access key and storage account name. Any idea how to read the file using PySpark/SQL? Thanks in advance.

Latest Reply
Atanu
Databricks Employee
  • 8 kudos

@Ambiga D, you need to mount the storage; you can follow this: https://docs.databricks.com/data/data-sources/azure/azure-storage.html#mount-azure-blob-storage-containers-to-dbfs. Thanks.

4 More Replies
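Besides mounting, the file can also be read directly by putting the account key in the Spark conf; a sketch with angle-bracket placeholders:

    # Make the storage key available to the cluster's Hadoop client
    spark.conf.set(
        "fs.azure.account.key.<storage-account>.blob.core.windows.net",
        "<access-key>")

    df = (spark.read
          .option("header", True)
          .csv("wasbs://<container>@<storage-account>.blob.core.windows.net/path/to/file.csv"))
    display(df)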
Jeff1
by Contributor II
  • 25386 Views
  • 5 replies
  • 9 kudos

Resolved! How to write a *.csv file from Databricks FileStore

Struggling with how to export a Spark dataframe as a *.csv file to a local computer. I'm successfully using the spark_write_csv function (sparklyr R library) to write the CSV file out to my Databricks dbfs:/FileStore location. Because (I'm assuming)...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

sparklyr has a different syntax; there is the function sdf_coalesce. The code you pasted is for Scala/Python. Additionally, even in Python you can only specify a folder, not a file, so csv("dbfs:/FileStore/temp/")

4 More Replies
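In PySpark terms, the reply's point looks like the sketch below: coalesce to one partition and point the writer at a folder, since Spark writes a part-*.csv file inside it. The folder path matches the reply's example:

    (df.coalesce(1)                    # single partition -> single output file
       .write
       .option("header", True)
       .mode("overwrite")
       .csv("dbfs:/FileStore/temp/"))  # a folder, not a file name

Files under /FileStore can then be downloaded from a browser via the workspace's /files/ URL path.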
Anonymous
by Not applicable
  • 881 Views
  • 0 replies
  • 2 kudos

www.vandevelde.eu

June Featured Member of the Month! Werner Stinckens. Job Title: Data Engineer @ Van de Velde (www.vandevelde.eu). What are three words your coworkers would use to describe you? Helpful, accurate, inquisitive. What is your favorite thing about your curren...

enri_casca
by New Contributor III
  • 12561 Views
  • 13 replies
  • 2 kudos

Resolved! Couldn't convert string to float when fitting model

Hi, I am very new to Databricks and I am trying to run quick experiments to understand the best practices for me, my colleagues, and the company. I pull the data from Snowflake: df = spark.read \ .format("snowflake") \ .options(**options) \ .option('qu...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Can you check this SO topic?

12 More Replies
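A common cause of this error, offered as a guess since the thread resolves via an external link: columns read from Snowflake arrive as strings, and the model's fit step cannot convert them. Casting the feature columns to a numeric type first usually clears it; the column names here are hypothetical:

    from pyspark.sql.functions import col

    feature_cols = ["age", "income", "score"]   # hypothetical feature columns

    df_numeric = df.select(
        *[col(c).cast("double").alias(c) for c in feature_cols])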
SailajaB
by Valued Contributor III
  • 13121 Views
  • 5 replies
  • 12 kudos

Resolved! How to convert each row of a df to an array of rows (list of rows)

Hi, how do we convert each row of a dataframe to an array of rows? Here is our scenario: we need to pass each row of the dataframe to a function as a dict, to apply key-level transformations. But as our data is very large, we can't use collect: df.toJSON().colle...

Latest Reply
SailajaB
Valued Contributor III
  • 12 kudos

@Hubert Dudek, thank you for the reply. We are new to ADB. We are using the code below, and looking for an optimized way to do it:
dfJSONString = df.toJSON().collect()
stringList = []
for row in dfJSONString:
    # ==== Unflatten the JSON string ==== #
    js...

4 More Replies
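A sketch of the usual collect-free pattern: map over df.toJSON() so the per-row work stays distributed on the executors instead of being pulled to the driver. The transformation body is hypothetical:

    import json

    def transform(record):
        # Hypothetical key-level transformation applied to each row-as-dict
        return {k.lower(): v for k, v in record.items()}

    # toJSON() yields an RDD of JSON strings; the map runs on the executors
    transformed = df.toJSON().map(lambda s: json.dumps(transform(json.loads(s))))

    # Read the transformed strings back into a dataframe
    result_df = spark.read.json(transformed)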
Alix
by New Contributor III
  • 13555 Views
  • 8 replies
  • 3 kudos

Resolved! Remote RPC client disassociated error

Hello, I've been trying to submit a job to a transient cluster, but it is failing with this error: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...

Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

@Alix Métivier - The error is thrown from the user code (please investigate the jar file attached to the cluster): at m80.dbruniv_0_1.dbruniv.tFixedFlowInput_1Process(dbruniv.java:941) at m80.dbruniv_0_1.dbruniv.run(dbruniv.java:1654) at m80.dbruniv_...

7 More Replies
cuteabhi32
by New Contributor III
  • 52648 Views
  • 11 replies
  • 1 kudos

Resolved! Trying to check if a column exists in a dataframe; if not, I have to give NULL, and if yes, the column itself, using a UDF

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
df1 = spark.read.form...

Latest Reply
cuteabhi32
New Contributor III
  • 1 kudos

Thanks, I modified my code as per your suggestion and it worked perfectly. Thanks again for all your inputs:
dflist = spark.createDataFrame(list(a.columns), "string").toDF("Name")
dfg = dflist.filter(col('name').isin('ref_date')).count()
if dfg == 1:
    a = a.wi...

10 More Replies
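The same check reduces to a plain membership test on df.columns, with no helper dataframe or UDF needed; a sketch using the thread's ref_date column and the df1 from the question:

    from pyspark.sql.functions import col, lit

    if "ref_date" in df1.columns:
        df1 = df1.withColumn("ref_date", col("ref_date"))
    else:
        # Column missing: add it as a typed NULL so downstream schemas stay stable
        df1 = df1.withColumn("ref_date", lit(None).cast("string"))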
