Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hello_world
by Databricks Partner
  • 6099 Views
  • 3 replies
  • 2 kudos

What exact difference does Auto Loader make?

New to Databricks, and here is one thing that confuses me: Spark Structured Streaming is already capable of incremental loading via checkpointing. What difference does enabling Auto Loader make?

Latest Reply
Meghala
Valued Contributor II
  • 2 kudos

Auto Loader provides a Structured Streaming source called cloudFiles. Given an input directory path on cloud file storage, the cloudFiles source automatically processes new files as they arrive, with the option of also processing existing files in that directory.
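To make that concrete, here is a minimal Auto Loader sketch. The paths, schema location, and table name are placeholder assumptions, and `spark` only exists inside a Databricks runtime, so treat this as illustrative rather than runnable as-is:

```python
# Hypothetical Auto Loader ingestion sketch; all paths and the target
# table name are invented for illustration.
stream = (spark.readStream
    .format("cloudFiles")                              # Auto Loader source
    .option("cloudFiles.format", "json")               # format of the incoming files
    .option("cloudFiles.includeExistingFiles", "true") # also pick up files already there
    .option("cloudFiles.schemaLocation", "/mnt/chk/events/schema")
    .load("/mnt/raw/events/"))                         # hypothetical input directory

(stream.writeStream
    .option("checkpointLocation", "/mnt/chk/events/")  # tracks which files were ingested
    .trigger(availableNow=True)                        # process available files, then stop
    .toTable("bronze_events"))                         # hypothetical target table
```

The practical difference from a plain file-source stream is not checkpointing itself (both have that) but scalable file discovery (file notifications or optimized listing) plus schema inference and evolution.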

  • 2 kudos
2 More Replies
KuldeepChitraka
by New Contributor III
  • 12375 Views
  • 4 replies
  • 6 kudos

Error handling/exception handling in notebooks

What is a common practice for writing a notebook that includes error handling/exception handling? Is there an example that shows how a notebook should be written to include error handling, etc.?

Latest Reply
Meghala
Valued Contributor II
  • 6 kudos

The runtime looks for handlers (try/catch blocks) that are registered to handle such exceptions.
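As a rough illustration (plain Python, not an official Databricks pattern): wrap each unit of notebook work in try/except, log the failure, and re-raise so the job run is marked as failed. The function and step names here are made up:

```python
def run_step(step_name, fn, *args, **kwargs):
    """Run one unit of notebook work with basic error handling."""
    try:
        result = fn(*args, **kwargs)
        print(f"[OK] {step_name}")
        return result
    except Exception as exc:
        print(f"[FAILED] {step_name}: {exc}")
        raise  # re-raise so the notebook/job run is marked as failed

def load_rows():
    return [1, 2, 3]

rows = run_step("load_rows", load_rows)
```

In a Databricks job you would typically let the exception propagate (so the run fails visibly) or call `dbutils.notebook.exit(...)` with a status payload, depending on how the caller consumes the result.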

3 More Replies
Aviral-Bhardwaj
by Esteemed Contributor III
  • 16340 Views
  • 3 replies
  • 25 kudos

Understanding Joins in PySpark/Databricks

In PySpark, a `join` operation combines rows from two or more datasets based on a common key. It allows you to merge data from different sources into a single dataset and potentially perform transformations on...
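A small self-contained sketch of the common join types (the data and column names are invented; this assumes a local PySpark installation):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("join-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "A", 100), (2, "B", 50), (3, "C", 75)], ["order_id", "cust_id", "amount"])
customers = spark.createDataFrame(
    [("A", "Alice"), ("B", "Bob"), ("D", "Dana")], ["cust_id", "name"])

inner = orders.join(customers, on="cust_id", how="inner")      # only matching keys
left = orders.join(customers, on="cust_id", how="left")        # keep all orders
anti = orders.join(customers, on="cust_id", how="left_anti")   # orders with no customer
```

Passing `on="cust_id"` (rather than an equality expression) also deduplicates the join column in the output, which is usually what you want for a shared key.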

Latest Reply
Meghala
Valued Contributor II
  • 25 kudos

very informative

2 More Replies
SaraGHn
by New Contributor III
  • 2216 Views
  • 1 replies
  • 4 kudos

Error for sparkdl.xgboost import XgboostRegressor

I get the error: cannot import name 'resnet50' from 'keras.applications' (/local_disk0/.ephemeral_nfs/envs/pythonEnv-a3e7b0cc-064d-4585-abfd-6473ed1c1a5b/lib/python3.8/site-packages/keras/applications/__init__.py). It looks like the Keras.applications...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 4 kudos

Try installing these libraries via an init script. Sometimes this happens due to the Spark version in Databricks; libraries can conflict with the runtime version.

georgian2133
by New Contributor
  • 2955 Views
  • 0 replies
  • 0 kudos

Getting error [DATATYPE_MISMATCH.BINARY_OP_DIFF_TYPES]

[DATATYPE_MISMATCH.BINARY_OP_DIFF_TYPES] Cannot resolve "(DocDate AND orderedhl)" due to data type mismatch: the left and right operands of the binary operator have incompatible types ("STRING" and "DECIMAL(38,6)"); line 67, pos 0.
66. group by
67. or...
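The message means the boolean operator AND received a STRING on one side and a DECIMAL on the other; AND needs two boolean operands, so each side must itself be a comparison (or be cast to a compatible type). A hedged sketch, with the intended predicate assumed since the full query isn't shown:

```python
from pyspark.sql import functions as F

# Wrong: F.col("DocDate") & F.col("orderedhl")
#   -> DATATYPE_MISMATCH.BINARY_OP_DIFF_TYPES, because neither side is boolean.
# Right: make each side of AND a boolean comparison (casting where needed).
# The threshold values are invented placeholders; `df` stands in for the poster's table.
cond = (F.col("DocDate").cast("date") >= "2023-01-01") & (F.col("orderedhl") > 0)
filtered = df.where(cond)
```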

joakon
by New Contributor III
  • 4776 Views
  • 5 replies
  • 1 kudos

Resolved! slow running query

Hi all, I would like to get some ideas on how to improve performance on a DataFrame with around 10M rows. ADLS Gen2; df1 = source1, format parquet (10M); df2 = source2, format parquet (10M); df = join of df1 and df2, type = inner join; df.count() is ...
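A few things that often help with an inner join of two ~10M-row parquet sources, sketched with placeholder paths and a hypothetical join key `id`:

```python
# Placeholder ADLS Gen2 paths; adjust to the real sources.
df1 = spark.read.parquet("abfss://container@account.dfs.core.windows.net/source1")
df2 = spark.read.parquet("abfss://container@account.dfs.core.windows.net/source2")

# Let Adaptive Query Execution tune shuffle partitions and handle skew:
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Prune to only the columns you need before joining:
joined = df1.select("id", "a").join(df2.select("id", "b"), "id")

# If one side were much smaller, broadcasting it would avoid a shuffle:
from pyspark.sql.functions import broadcast
joined_small = df1.join(broadcast(df2), "id")

# Cache before repeated actions so count() doesn't recompute the join:
joined.cache()
print(joined.count())
```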

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @raghu maremanda, did you get any answer? If yes, please update here so that other people can also get the solution.

4 More Replies
test_user
by New Contributor II
  • 44704 Views
  • 3 replies
  • 1 kudos

How to explode an array column and repack the distinct values into one array in DB SQL?

Hi, I am new to DB SQL. I have a table where the array column (cities) contains multiple arrays and some have multiple duplicate values. I need to unpack the array values into rows so I can list the distinct values. The following query works for this...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Try using SQL window functions here.
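Besides window functions, one common pattern for this is explode plus array_distinct / collect_set. A small sketch with an invented schema (assumes a local PySpark installation):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[2]").getOrCreate()

df = spark.createDataFrame(
    [(1, ["NY", "LA", "NY"]), (2, ["LA", "SF"])], ["id", "cities"])

# Distinct values within each row's array:
per_row = df.withColumn("cities", F.array_distinct("cities"))

# Distinct values across all rows, repacked into one array:
repacked = (df.select(F.explode("cities").alias("city"))
              .agg(F.collect_set("city").alias("cities")))
```

The same shape works in DB SQL with `explode(...)` in a lateral view and `collect_set(...)` in the aggregate.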

2 More Replies
Aviral-Bhardwaj
by Esteemed Contributor III
  • 11205 Views
  • 6 replies
  • 33 kudos

Resolved! Timezone understanding

Today I was working with time zone data. My Singapore users want to see their own time in the data, and my USA users want to see their time in the data; instead, we are all getting UTC time. How do I solve this issue? Please guide. Data can be anything...
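One common approach: keep the stored timestamps in UTC and convert per audience at read/display time with `from_utc_timestamp`. The DataFrame and column names below are invented:

```python
from pyspark.sql import functions as F

# Assuming an `events` DataFrame with a UTC timestamp column `ts_utc`:
localized = (events
    .withColumn("ts_sg", F.from_utc_timestamp("ts_utc", "Asia/Singapore"))
    .withColumn("ts_ny", F.from_utc_timestamp("ts_utc", "America/New_York")))

# Alternatively, set the session time zone so timestamps render in one chosen zone:
spark.conf.set("spark.sql.session.timeZone", "Asia/Singapore")
```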

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 33 kudos

I got it, guys. It was happening due to a library conflict. Your answers were really helpful; I tried all of them.

5 More Replies
Ruby8376
by Valued Contributor
  • 4204 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks authentication

Hi there! We are planning to use the Databricks-Tableau on-prem integration for reporting. Data would reside in Delta Lake, and using the Tableau-Databricks connector, users would be able to generate reports from that data. The question is: a private endpoint wi...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

And make sure that you are going with the Spark SQL connection, else it will always fail.

4 More Replies
Sharmila04
by New Contributor
  • 5273 Views
  • 3 replies
  • 0 kudos

DBFS File Browser Error RESOURCE_DOES_NOT_EXIST:

Hi, I am new to Databricks and was trying to follow a tutorial to upload a file and move it under a different folder. I used the DBFS option. While trying to move/rename the file I am getting the below error; can you please help me understand why I am g...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

Use these three commands and it will work:
dbutils.fs.ls('dbfs:/FileStore/vehicle_data.csv')
dbutils.fs.ls('/dbfs/FileStore/vehicle_data.csv')
dbutils.fs.ls('/dbfs/dbfs/FileStore/vehicle_data.csv')
Thanks, Aviral
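Once `dbutils.fs.ls` confirms which path the file actually lives under, the move/rename itself is `dbutils.fs.mv`. The destination folder below is a hypothetical example, and `dbutils` only exists inside a Databricks workspace:

```python
# Confirm the source actually exists under this exact path first:
dbutils.fs.ls("dbfs:/FileStore/vehicle_data.csv")

# Then move/rename it (destination folder is an invented example):
dbutils.fs.mv("dbfs:/FileStore/vehicle_data.csv",
              "dbfs:/FileStore/archive/vehicle_data.csv")
```

RESOURCE_DOES_NOT_EXIST usually just means the source path string doesn't match where the file was uploaded, which is why listing each candidate path first helps.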

2 More Replies
SRK
by Databricks Partner
  • 10765 Views
  • 2 replies
  • 0 kudos

How to get the count of dataframe rows when reading through spark.readstream using batch jobs?

I am trying to read messages from a Kafka topic using spark.readStream. I am using the following code to read it.
My code:
df = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "192.1xx.1.1xx:9xx")
    .option("subscr...

Latest Reply
daniel_sahal
Databricks MVP
  • 0 kudos

You can try this approach: https://stackoverflow.com/questions/57568038/how-to-see-the-dataframe-in-the-console-equivalent-of-show-for-structured-st/62161733#62161733. readStream runs a thread in the background, so there's no easy equivalent of df.show().
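A common way to get per-batch row counts from a streaming read is `foreachBatch`, where each micro-batch arrives as a normal (static) DataFrame. Sketch, with the checkpoint path as a placeholder and `df` standing in for the poster's readStream DataFrame:

```python
def log_batch_count(batch_df, batch_id):
    # Inside foreachBatch the micro-batch is a static DataFrame,
    # so ordinary actions like count() work:
    print(f"batch {batch_id}: {batch_df.count()} rows")

(df.writeStream
   .foreachBatch(log_batch_count)
   .option("checkpointLocation", "/tmp/checkpoints/kafka-count")
   .start())
```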

1 More Replies
KVNARK
by Honored Contributor II
  • 10621 Views
  • 7 replies
  • 7 kudos

Resolved! Copying delta to Azure SQL DB.

How do I copy Delta to Azure SQL DB using ADF? Earlier we were using parquet format. Now we have converted parquet to Delta using the command below: CONVERT TO DELTA parquet.path (Azure Blob path)

Latest Reply
Ajay-Pandey
Databricks MVP
  • 7 kudos

Hi @Aviral Bhardwaj, in ADF there is a Delta Lake option; you can directly save your file in Delta Lake format.

6 More Replies
Mado
by Valued Contributor II
  • 2848 Views
  • 2 replies
  • 2 kudos

How can I pull all branches at once in Databricks?

Hi, I have cloned a remote repository into my folder in Repos. The repository has several feature branches. When I want to pull a branch, I select the desired branch in Repos and click the "Pull" button, and then I need to select another branch. Is th...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 2 kudos

Hi @Mohammad Saber, the Q&A links below might help you: link, link2

1 More Replies