Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16826994223
by Honored Contributor III
  • 4249 Views
  • 2 replies
  • 1 kudos

AssertionError: assertion failed: Unable to delete the record but I am able to select it though

Is there any reason this command works well: %sql SELECT * FROM database.table WHERE salary > 1000, returning 2 rows, while the one below: %sql DELETE FROM database.table WHERE salary > 1000 fails with: Error in SQL statement: AssertionError: assertion failed:...

Latest Reply
User16826994223
Honored Contributor III

DELETE FROM (and similarly UPDATE) isn't supported on Parquet files - right now on Databricks, it's supported for the Delta format. You can convert your Parquet files to Delta using CONVERT TO DELTA, and then this command will work for you.
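For reference, a minimal sketch of that workaround in a notebook cell (the table name is taken from the question and may differ in your workspace):

    # Convert the Parquet table's metadata to Delta in place, then DELETE works.
    spark.sql("CONVERT TO DELTA database.table")
    spark.sql("DELETE FROM database.table WHERE salary > 1000")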

1 More Replies
dataslicer
by Contributor
  • 7133 Views
  • 4 replies
  • 4 kudos

Resolved! Unable to save Spark Dataframe to driver node's local file system as CSV file

Running Azure Databricks Enterprise DBR 8.3 ML on a single node, with a Python notebook. I have 2 small Spark dataframes that I am able to source via credential passthrough, reading from ADLSgen2 via the `abfss://` method, and display the full content ...

Latest Reply
Dan_Z
Honored Contributor

Modern Spark operates by a design choice to separate storage and compute. So saving a CSV to the driver's local disk doesn't make sense for a few reasons: the worker nodes don't have access to the driver's disk. They would need to send the data over to...
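For a small dataframe, one hedged workaround is to collect it to the driver and write with pandas (assuming df fits in driver memory; the output path is illustrative):

    # toPandas() pulls all rows to the driver, so the write happens on the
    # driver's local file system rather than on the workers.
    df.toPandas().to_csv("/tmp/output.csv", index=False)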

3 More Replies
PaulHernandez
by New Contributor II
  • 18116 Views
  • 7 replies
  • 0 kudos

Resolved! How to show an image in a notebook using html?

Hi everyone, I am just learning how to personalize Databricks notebooks and would like to show a logo in a cell. I installed the Databricks CLI and was able to upload the image file to dbfs:. I try to display it like this: displayHTML("<im...

Latest Reply
_robschaper
New Contributor II

@Paul Hernandez​ @Sean Owen​ @Navneet Tuteja​ I solved this after I also ran into the same issue, where my notebook suddenly wouldn't show an image sitting on the driver in an accessible folder - no matter what I was trying in the notebook, the display...
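One common approach, sketched here under the assumption that the image was uploaded to DBFS under /FileStore (which the workspace serves at the /files/ URL path; the file name is hypothetical):

    # Reference the uploaded file via the workspace's /files/ route.
    displayHTML('<img src="/files/logo.png" width="200"/>')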

6 More Replies
daindana
by New Contributor III
  • 8038 Views
  • 4 replies
  • 4 kudos

Resolved! Why doesn't my notebook display widgets when I use 'dbutils' while it is displayed with '%sql CREATE WIDGET'?

The widget is not shown when I use dbutils, while it works perfectly with SQL. For example, %sql CREATE WIDGET TEXT state DEFAULT "CA" - this one shows me the widget. dbutils.widgets.text("name", "Brickster", "Name") dbutils.widgets.multiselect("colors", "oran...
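For reference, a minimal sketch of the Python API using the widget names from the question:

    # Should render a text box at the top of the notebook, just like the SQL version.
    dbutils.widgets.text("name", "Brickster", "Name")
    print(dbutils.widgets.get("name"))  # reads the current value ("Brickster" by default)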

Tags: dbutils get info from widget, dbutils widget creation
Latest Reply
daindana
New Contributor III

Hello, Ryan! For some reason the problem is solved, and now it is working perfectly! I did nothing new, but it is just working now. Thank you! :)

3 More Replies
BorislavBlagoev
by Valued Contributor III
  • 3915 Views
  • 5 replies
  • 4 kudos

Resolved! Databricks writeStream checkpoint

I'm trying to execute this writeStream:
data_frame.writeStream.format("delta") \
    .option("checkpointLocation", checkpoint_path) \
    .trigger(processingTime="1 second") \
    .option("mergeSchema", "true") \
    .o...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

You can remove that folder so it will be recreated automatically. Additionally, every new job run should have a new (or just empty) checkpoint location. You can add this to your code before running the stream: dbutils.fs.rm(checkpoint_path, True). Additionally you...
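Putting that together, a minimal sketch (data_frame and checkpoint_path are as in the question; output_path is a hypothetical target):

    # Clear the old checkpoint so the stream starts fresh, then start the stream.
    dbutils.fs.rm(checkpoint_path, True)

    (data_frame.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime="1 second")
        .option("mergeSchema", "true")
        .start(output_path))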

4 More Replies
halfwind22
by New Contributor III
  • 8356 Views
  • 11 replies
  • 12 kudos

Resolved! Unable to write csv files to Azure BLOB using pandas to_csv ()

I am using a Python function to read some data from a GET endpoint and write it as a CSV file to an Azure Blob location. My GET endpoint takes 2 query parameters, param1 and param2. So initially, I have a dataframe paramDf that has two columns, param1 and ...

Latest Reply
halfwind22
New Contributor III

@Hubert Dudek​ I can't issue a Spark command on an executor node - it throws an error, because foreach distributes the processing.
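One hedged workaround is to use plain Python inside the foreach function instead of Spark APIs, since executors cannot call the SparkSession. A sketch using azure-storage-blob (conn_str, the container name, and fetch_csv are all hypothetical):

    from azure.storage.blob import BlobClient

    def write_row_to_blob(row):
        csv_text = fetch_csv(row.param1, row.param2)  # call the GET endpoint; returns CSV text
        blob = BlobClient.from_connection_string(
            conn_str, container_name="mycontainer",
            blob_name=f"out_{row.param1}_{row.param2}.csv")
        blob.upload_blob(csv_text, overwrite=True)  # plain HTTP upload, no Spark on the executor

    paramDf.foreach(write_row_to_blob)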

10 More Replies
ItsMe
by New Contributor II
  • 2533 Views
  • 4 replies
  • 7 kudos

Resolved! Run Pyspark job of Python egg package using spark submit on databricks

Error: missing application resource. Getting this error while running a job with spark-submit. I have given the following parameters while creating the job:
--conf spark.yarn.appMasterEnv.PYSAPRK_PYTHON=databricks/path/python3
--py-files dbfs/path/to/.egg job_m...

Latest Reply
User16752246494
Contributor

Hi, we tried to simulate the question on our end, and what we did was package a module inside a whl file. Then, to access the wheel file, we created another Python file, test_whl_locally.py. Inside test_whl_locally.py, to access the content of the wheel file...
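A minimal sketch of what such a test_whl_locally.py might look like (the wheel path and the module/function names are hypothetical; this works for pure-Python wheels, which Python can import directly from sys.path):

    import sys

    # Wheels are zip archives, so a pure-Python wheel can go on sys.path directly.
    sys.path.append("/dbfs/path/to/my_package-0.1-py3-none-any.whl")

    from my_module import my_func

    my_func()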

3 More Replies
afshinR
by New Contributor III
  • 681 Views
  • 1 reply
  • 1 kudos

Hi, could you please help me with my question? I have not gotten any answers.

Hi, could you please help me with my question? I have not gotten any answers.

Latest Reply
Kaniz_Fatma
Community Manager

Hi @afshin riahi​, yes, definitely I can help you with it. Please wait while I or someone from the community gets back with a response. Thank you for your patience.

User16868770416
by Contributor
  • 3584 Views
  • 1 reply
  • 0 kudos

What is the best way to decode protobuf using pyspark?

I am using Spark Structured Streaming to read a protobuf-encoded message from the Event Hub. We use a lot of Delta tables, but there isn't a simple way to integrate this. We are currently using K-SQL to transform into Avro on the fly and then use Dat...

Latest Reply
jose_gonzalez
Moderator

Hi @Will Block​, I think a related question was asked in the past; I think it was this one. I also found this library, I hope it helps.
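The usual pattern is to decode the binary payload in a Python UDF with protoc-generated classes. A minimal sketch (my_pb2.MyMessage, raw_df, and the "body" column are hypothetical):

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def decode(payload):
        from my_pb2 import MyMessage         # class generated by protoc
        msg = MyMessage()
        msg.ParseFromString(bytes(payload))  # deserialize the protobuf bytes
        return msg.some_field                # extract whichever fields you need

    decode_udf = udf(decode, StringType())
    decoded_df = raw_df.withColumn("decoded", decode_udf("body"))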

marchello
by New Contributor III
  • 4700 Views
  • 9 replies
  • 3 kudos

Resolved! error on connecting to Snowflake

Hi team, I'm getting a weird error in one of my jobs when connecting to Snowflake. All my other jobs (I've got plenty) work fine. The current one also works fine when I have only one coding step (except installing needed libraries in my very first step...

Latest Reply
Dan_Z
Honored Contributor

@marchello​ I suggest you contact Snowflake to move forward on this one.

8 More Replies
William_Scardua
by Valued Contributor
  • 2559 Views
  • 5 replies
  • 4 kudos

Resolved! Small/big file problem, how do you fix it ?

How do you work on fixing the small/big file problem? What do you suggest?

Latest Reply
-werners-
Esteemed Contributor III

What Jose said. If you cannot use Delta or do not want to: the use of coalesce and repartition/partitioning is the way to define the file size. There is no one ideal file size. It all depends on the use case, available cluster size, data flow downstrea...
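A minimal sketch of steering the output file count on write (df, the target path, and the choice of 8 partitions are all illustrative):

    # repartition(8) does a full shuffle into exactly 8 partitions -> 8 output files;
    # coalesce(8) avoids the shuffle but can only reduce the partition count.
    (df.repartition(8)
       .write.mode("overwrite")
       .parquet("/mnt/output/table"))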

4 More Replies
Kaniz_Fatma
by Community Manager
  • 1012 Views
  • 1 reply
  • 1 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III

Hi Kaniz, if you want to use Databricks to read data from one database and write to another database, I would imagine that you would want to use the MongoDB connector. Check out our docs here.
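A minimal sketch of that read-from-one, write-to-another pattern, assuming the MongoDB Spark connector (v10+) is installed on the cluster (URIs, database, and collection names are hypothetical; check the connector docs for the exact option keys):

    df = (spark.read.format("mongodb")
          .option("connection.uri", "mongodb://source-host:27017")
          .option("database", "src_db")
          .option("collection", "src_coll")
          .load())

    (df.write.format("mongodb")
       .option("connection.uri", "mongodb://target-host:27017")
       .option("database", "dst_db")
       .option("collection", "dst_coll")
       .mode("append")
       .save())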

Kaniz_Fatma
by Community Manager
  • 1517 Views
  • 1 reply
  • 0 kudos
Latest Reply
shan_chandra
Esteemed Contributor

Please run the below steps in an isolated notebook to connect to Athena:
1. Install boto3: %sh pip install boto3
2. Check if the boto3 library is installed: %python import boto3; boto3.__version__
3. Run the below code: %python import boto3 client = boto3.cli...
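A minimal sketch of step 3, assuming AWS credentials are available to the cluster (the region, database, table, and S3 output location are hypothetical):

    import boto3

    client = boto3.client("athena", region_name="us-east-1")
    resp = client.start_query_execution(
        QueryString="SELECT * FROM my_table LIMIT 10",
        QueryExecutionContext={"Database": "my_db"},
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )
    print(resp["QueryExecutionId"])  # poll get_query_execution with this id for status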

shan_chandra
by Esteemed Contributor
  • 6701 Views
  • 1 reply
  • 3 kudos

Resolved! Cannot reserve additional contiguous bytes in the vectorized reader (requested xxxxxxxxx bytes).

I got the below error when running a streaming workload from a source Delta table: Caused by: java.lang.RuntimeException: Cannot reserve additional contiguous bytes in the vectorized reader (requested xxxxxxxxx bytes). As a workaround, you can reduce ...

Latest Reply
shan_chandra
Esteemed Contributor

This is happening because the delta/parquet source has one or more of the following:
  • a huge number of columns
  • huge strings in one or more columns
  • huge arrays/maps, possibly nested in each other
In order to mitigate this issue, could you please reduce spar...
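The truncated advice appears to point at the vectorized reader's batch size. A minimal sketch, assuming that is the intended config (the default is 4096 rows; the value below is illustrative):

    # Smaller batches reserve less contiguous memory per column batch.
    spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", 1024)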

jsaddam28
by New Contributor III
  • 42038 Views
  • 24 replies
  • 15 kudos

How to import local python file in notebook?

For example, I have one.py and two.py in Databricks and I want to use one of the modules from one.py in two.py. Usually I do this on my local machine with an import statement like below in two.py: from one import module1 . . . How to do this in Databricks???...

Latest Reply
StephAlbaRivera
Valued Contributor II

USE REPOS! Repos is able to call a function that is in a file in the same GitHub repo, as long as Files is enabled in the admin panel. So if I have utils.py with:
import pandas as pd

def clean_data():
    # Load wine data
    data = pd.read_csv("/dbfs/da...
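A minimal sketch of the notebook side of that pattern (assuming clean_data returns the cleaned dataframe; the call is illustrative):

    # In a notebook in the same Repo, the sibling file imports directly.
    from utils import clean_data

    df = clean_data()
    display(df)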

23 More Replies