cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

arw1070
by New Contributor II
  • 1268 Views
  • 3 replies
  • 0 kudos

Databricks extension is not configuring in VScode

I am trying to install and work with the Databricks vscode extensions. I installed it a few weeks ago, and it initially worked, but I mistyped some of the configuration so I tried to restart, since then it has not worked. Whenever I install the exten...

  • 1268 Views
  • 3 replies
  • 0 kudos
Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@Anna Wuest​ I have Tried and not seeing any issues, which version of Vs code you are using. can you please try to update to latest Visual Studio Code version 1.77.1 and try to Install databricks plugin version and test .if you using windows--> pleas...

  • 0 kudos
2 More Replies
Nandini
by New Contributor II
  • 6915 Views
  • 10 replies
  • 7 kudos

Pyspark: You cannot use dbutils within a spark job

I am trying to parallelise the execution of file copy in Databricks. Making use of multiple executors is one way. So, this is the piece of code that I wrote in pyspark.def parallel_copy_execution(src_path: str, target_path: str): files_in_path = db...

  • 6915 Views
  • 10 replies
  • 7 kudos
Latest Reply
Etyr
Contributor
  • 7 kudos

If you have spark session, you can use Spark hidden File System:# Get FileSystem from SparkSession fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration()) # Get Path class to convert string path to FS path path = spark._...

  • 7 kudos
9 More Replies
GuMart
by New Contributor III
  • 921 Views
  • 2 replies
  • 1 kudos

Delta Live Tables - RETRY_ON_FAILURE

Hi,Is it possible to set it up the RETRY_ON_FAILURE property for DLTs through the API?I'm not finding in the Docs (although it seems to exist in a response payload).https://docs.databricks.com/delta-live-tables/api-guide.html

  • 921 Views
  • 2 replies
  • 1 kudos
Latest Reply
GuMart
New Contributor III
  • 1 kudos

Hi @Suteja Kanuri​ ,Thank you so much for the quick and complete answer!Regards,

  • 1 kudos
1 More Replies
alm
by New Contributor III
  • 2127 Views
  • 2 replies
  • 2 kudos

Resolved! Vectorized reading of parquet file containing decimal type column(s)

I was trying to read a parquet file, and write to a delta table, with a parquet file that contains decimal type columns. I encountered a problem that is pretty neatly described by this kb.databricks article, and which I solved by disabling the vector...

  • 2127 Views
  • 2 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Alberte Mørk​ :The behavior you observed is due to a known issue in Apache Spark when vectorized reading is used with Parquet files that contain decimal type columns. As you mentioned, the issue can be resolved by disabling vectorized reading for th...

  • 2 kudos
1 More Replies
Anonymous
by Not applicable
  • 676 Views
  • 2 replies
  • 2 kudos

Hello Everyone, I'm interested to learn about the certifications you're pursuing to enhance your skills. Sharing your goals can inspire those ...

Hello Everyone,I'm interested to learn about the certifications you're pursuing to enhance your skills. Sharing your goals can inspire those who may have started their certification journey but struggled with motivation. Personally, I recently comple...

  • 676 Views
  • 2 replies
  • 2 kudos
Latest Reply
FJ
Contributor III
  • 2 kudos

I'm trying the Data Engineering professional exam at the end of the month. It's like a shot in the dark because no practice exams stop are available and from what I've seen online from people who already passed it, the Advanced Data Engineering with ...

  • 2 kudos
1 More Replies
Anonymous
by Not applicable
  • 3577 Views
  • 8 replies
  • 0 kudos

Not able to connect to On-Prem Oracle from Databricks cluster

Hi Everyone,I was trying to connect to Oracle Instance from Databricks cluster and it is giving below error:java.sql.SQLTimeoutException: ORA-12170: Cannot connect. TCP connect timeout of 30000ms for host xx.x.x.*** port 1521. (CONNECTION_ID=CgM7V7UB...

  • 3577 Views
  • 8 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Satya89:The error message you received indicates that the TCP connection to the Oracle database timed out. This could be caused by a number of factors such as network issues, firewall restrictions, or the database being overloaded.Here are a few ste...

  • 0 kudos
7 More Replies
rusty9876543
by New Contributor II
  • 3581 Views
  • 5 replies
  • 2 kudos

Split dataFrame into 1MB chunks and create a single json array with each row in chunk being an array element

Hi, I have a dataFrame that I've been able to convert into a struct with each row being a JSON object.I want the ability to split the data frame into 1MB chunks. Once I have the chunks, I would like to add all rows in each respective chunk into a sin...

  • 3581 Views
  • 5 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Tamoor Mirza​ :You can use the to_json method of a DataFrame to convert each chunk to a JSON string, and then append those JSON strings to a list. Here is an example code snippet that splits a DataFrame into 1MB chunks and creates a list of JSON arr...

  • 2 kudos
4 More Replies
Hansjoerg
by New Contributor
  • 919 Views
  • 2 replies
  • 0 kudos

Resolved! Is Azure AD Conditional Access also possible for the Databricks Account Console?

I wonder whether conditional access in Azure AD for Databricks (https://learn.microsoft.com/en-us/azure/databricks/administration-guide/access-control/conditional-access?source=docs) can be configured separately for the account console (https://accou...

  • 919 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Hansjörg Wingeier​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

  • 0 kudos
1 More Replies
tototox
by New Contributor III
  • 1421 Views
  • 3 replies
  • 0 kudos

Using dbutils.fs.ls gives overlap error.

I created a schema with that route as a managed location.(abfss://~~@~~.dfs.core.windows.net/dejeong)And an external table named 'first_table' was created in the corresponding path.(abfss://~~@~~.dfs.core.windows.net/dejeong/first_table)​The results ...

  • 1421 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @jin park​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we c...

  • 0 kudos
2 More Replies
Pien
by New Contributor II
  • 2458 Views
  • 2 replies
  • 0 kudos

Resolved! Change data format in an existing DB table

I got errors of incompatible filetypes while converting to pyspark df, so I changed all columns to string types. Now I'm trying to add this df to an existing table (where not everything was a string type). And I'm getting an error of incompatible da...

error
  • 2458 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Pien Derkx​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

  • 0 kudos
1 More Replies
kaileena
by New Contributor
  • 1369 Views
  • 2 replies
  • 0 kudos

Error in library(RMySQL): there is no package called ‘RMySQL’

i tried to install RMySQL on databricks like this:install.packages("RMySQL")i got this error:Installing package into ‘/local_disk0/.ephemeral_nfs/envs/rEnv-c677bc4c-e6a3-40df-a5ab-bfd5d277e0c0’ (as ‘lib’ is unspecified) Warning: unable to access inde...

  • 1369 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @miru miro​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

  • 0 kudos
1 More Replies
hvsk
by New Contributor
  • 8883 Views
  • 2 replies
  • 0 kudos

Using a Virtual environment

Hi All,We are working on training NHits/TFT (a Pytorch-forecasting implementation) for timeseries forecasting. However, we are having some issues with package dependency conflicts.Is there a way to consistently use a virtual environment across cells ...

  • 8883 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Harsh Kalra​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

  • 0 kudos
1 More Replies
JLCDA
by New Contributor
  • 1438 Views
  • 2 replies
  • 0 kudos

databricks-connect 9.1 : StreamCorruptedException: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe

Hello, I'm using databricks-connect 9.1 and I started having issues since last week in all functions that have a "collect()". Everything was working before : myList = df1.select("id").rdd.flatMap(lambda x: x).collect()here the error : py4j.protocol.P...

  • 1438 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Julien Larcher​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

  • 0 kudos
1 More Replies
hfrid
by New Contributor
  • 2827 Views
  • 1 replies
  • 0 kudos

JDBC connector seems to be a bottleneck when trying to insert dataframe to Azure SQL Server

Hi! I am inserting a pyspark dataframe to Azure sql server and it takes a very long time. The database is a s4 but my dataframe that is 17 million rows and 30 columns takes up to 50 minutes to insert.Is there a way to significantly speed this up? I a...

  • 2827 Views
  • 1 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Hjalmar Friden​ :There are several ways to improve the performance of inserting data into Azure SQL Server using JDBC connector:Increase the batch size: By default, the JDBC connector sends data in batches of 1000 rows at a time. You can increase th...

  • 0 kudos
Anonymous
by Not applicable
  • 1342 Views
  • 1 replies
  • 1 kudos

Testing framework using Databricks Notebook and Pytest.

Hi Friends,I am designing a Testing framework using Databricks and pytest. Currently stuck with report generation, that is generating blank with only default parameters only .for ex :-testsuites><testsuite name="pytest" errors="0" failures="0" skippe...

  • 1342 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Vijaya Palreddy​ :There are several testing frameworks available for data testing that you can consider using with Databricks and Pytest:Great Expectations: Great Expectations is an open-source framework that provides a simple way to create and main...

  • 1 kudos
Labels
Top Kudoed Authors