Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Pien
by New Contributor II
  • 12233 Views
  • 5 replies
  • 0 kudos

Resolved! Getting date out of year and week

Hi all, I'm trying to get a date out of the columns year and week. The week format is not recognized.
df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"), F.col('Week'), F.col('Jaar')))
df_loaded = df_loaded.withColumn("date", F.to_date(F...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Pien Derkx Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers yo...

4 More Replies
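
For the year-and-week question above, here is a minimal sketch in plain Python rather than Spark date patterns. It assumes the week numbers are ISO 8601 weeks and that the "3" in the post's concat was meant as the day of the week (Wednesday); both are guesses, not something the post confirms. `date_from_year_week` is a hypothetical helper name, and `date.fromisocalendar` needs Python 3.8 or later.

```python
from datetime import date


def date_from_year_week(year: int, week: int, weekday: int = 3) -> date:
    """Return the date of an ISO weekday (1=Monday .. 7=Sunday) in an ISO week."""
    # Assumption: ISO 8601 week numbering; weekday=3 mirrors the "3" in the post.
    return date.fromisocalendar(year, week, weekday)


# Wednesday of ISO week 14 in 2023
print(date_from_year_week(2023, 14))
```

A helper like this could be wrapped in a UDF and applied to the 'Jaar' and 'Week' columns, at the cost of UDF overhead.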
QuicKick
by New Contributor
  • 9788 Views
  • 2 replies
  • 0 kudos

How do I search for all the columns/field names starting with "XYZ"

I would like to do a big search on all field/columns names that contain "XYZ". I tried below sql but it's giving me an error.
SELECT table_name, column_name
FROM information_schema.columns
WHERE column_name like '%<account>%'
order by table_name, column_na...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ian Fox Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...

1 More Replies
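
For the column-name search above, a small sketch that builds the query text in Python. It assumes your workspace exposes `information_schema.columns` with `table_name` and `column_name`, as the post's query does; `build_column_search_sql` is a hypothetical helper, not a Databricks API. Note that the angle brackets in '%<account>%' are matched literally by LIKE, so if they were meant as a placeholder they need to be dropped.

```python
import re


def build_column_search_sql(fragment: str, prefix_only: bool = False) -> str:
    """Build a query listing columns whose name contains (or starts with) fragment."""
    # Only accept identifier characters so the fragment cannot break the SQL literal.
    # Caveat: "_" is itself a LIKE wildcard (any single character), so a fragment
    # containing it may match slightly more names than intended.
    if not re.fullmatch(r"[A-Za-z0-9_]+", fragment):
        raise ValueError(f"unsupported characters in fragment: {fragment!r}")
    needle = fragment.lower()
    pattern = f"{needle}%" if prefix_only else f"%{needle}%"
    return (
        "SELECT table_name, column_name\n"
        "FROM information_schema.columns\n"
        f"WHERE lower(column_name) LIKE '{pattern}'\n"
        "ORDER BY table_name, column_name"
    )


print(build_column_search_sql("XYZ"))
```

In a notebook the result could be passed to `spark.sql(...)`. Use `prefix_only=True` for names that start with the fragment, which is what the thread title asks for.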
kaileena
by New Contributor
  • 1768 Views
  • 2 replies
  • 0 kudos

Cannot install RMySQL: "there is no package called ‘RMySQL’"

I cannot install RMySQL on Databricks. I tried:
install.packages("RMySQL")
I got the error:
Installing package into ‘/local_disk0/.ephemeral_nfs/envs/rEnv-c677bc4c-e6a3-40df-a5ab-bfd5d277e0c0’ (as ‘lib’ is unspecified)
Warning: unable to access index for ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @miru miro Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

1 More Replies
Merchiv
by New Contributor III
  • 8316 Views
  • 4 replies
  • 0 kudos

Difference between Databricks and local pyspark split.

I have noticed some inconsistent behavior between calling the 'split' function on Databricks and on my local installation. Running it in a Databricks notebook gives
spark.sql("SELECT split('abc', ''), size(split('abc',''))").show()
So the string is split...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Ivo Merchiers: The behavior you are seeing is likely due to differences in the underlying version of Apache Spark between your local installation and Databricks. split() is a function provided by Spark's SQL functions, and different versions of Spa...

3 More Replies
arw1070
by New Contributor II
  • 3024 Views
  • 3 replies
  • 0 kudos

Databricks extension is not configuring in VScode

I am trying to install and work with the Databricks VS Code extension. I installed it a few weeks ago, and it initially worked, but I mistyped some of the configuration, so I tried to restart. Since then it has not worked. Whenever I install the exten...

Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@Anna Wuest I have tried it and am not seeing any issues. Which version of VS Code are you using? Can you please update to the latest Visual Studio Code version, 1.77.1, then install the Databricks plugin and test? If you are using Windows --> pleas...

2 More Replies
Nandini
by New Contributor II
  • 13875 Views
  • 10 replies
  • 7 kudos

Pyspark: You cannot use dbutils within a spark job

I am trying to parallelise the execution of file copy in Databricks. Making use of multiple executors is one way. So, this is the piece of code that I wrote in pyspark.
def parallel_copy_execution(src_path: str, target_path: str):
    files_in_path = db...

Latest Reply
Etyr
Contributor
  • 7 kudos

If you have a Spark session, you can use Spark's hidden FileSystem:
# Get FileSystem from SparkSession
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
# Get Path class to convert string path to FS path
path = spark._...

9 More Replies
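
Since the thread title says dbutils cannot be used inside a Spark job, one alternative to the approaches above is to keep the copy calls on the driver and parallelise them with threads. This is a sketch of that swapped-in technique, not the code from the thread; `parallel_copy` and its parameters are hypothetical names, and the copy function is passed in so the block runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def parallel_copy(
    pairs: Iterable[Tuple[str, str]],
    copy_fn: Callable[[str, str], object],
    max_workers: int = 8,
) -> List[Tuple[str, str]]:
    """Copy each (source, target) pair using driver-side threads.

    Returns the list of pairs once all copies finish; the first failure is re-raised.
    """
    pairs = list(pairs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(copy_fn, src, dst) for src, dst in pairs]
        for future in futures:
            future.result()  # re-raises any exception from the worker thread
    return pairs
```

In a notebook, `copy_fn` could be `dbutils.fs.cp`, for example `parallel_copy(pairs, dbutils.fs.cp)`. The work stays on the driver, so throughput is limited by the driver's network and thread count rather than spread over executors.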
GuMart
by New Contributor III
  • 2744 Views
  • 2 replies
  • 1 kudos

Delta Live Tables - RETRY_ON_FAILURE

Hi, Is it possible to set up the RETRY_ON_FAILURE property for DLTs through the API? I can't find it in the docs (although it seems to exist in a response payload). https://docs.databricks.com/delta-live-tables/api-guide.html

Latest Reply
GuMart
New Contributor III
  • 1 kudos

Hi @Suteja Kanuri, thank you so much for the quick and complete answer! Regards,

1 More Replies
alm
by New Contributor III
  • 6107 Views
  • 2 replies
  • 2 kudos

Resolved! Vectorized reading of parquet file containing decimal type column(s)

I was trying to read a parquet file, and write to a delta table, with a parquet file that contains decimal type columns. I encountered a problem that is pretty neatly described by this kb.databricks article, and which I solved by disabling the vector...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Alberte Mørk: The behavior you observed is due to a known issue in Apache Spark when vectorized reading is used with Parquet files that contain decimal type columns. As you mentioned, the issue can be resolved by disabling vectorized reading for th...

1 More Replies
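
As an illustration of disabling vectorized reading as discussed above, here is a sketch that turns the setting off only for the duration of one read. `spark.sql.parquet.enableVectorizedReader` is a standard Spark SQL setting; the context manager `vectorized_parquet_disabled` is a hypothetical helper, and whether a session-level toggle matches what the thread settled on is an assumption, since the reply is truncated.

```python
from contextlib import contextmanager

VECTORIZED_KEY = "spark.sql.parquet.enableVectorizedReader"


@contextmanager
def vectorized_parquet_disabled(spark):
    """Temporarily turn off the vectorized Parquet reader for this session."""
    # Spark's default for this setting is "true"; remember whatever is set now.
    previous = spark.conf.get(VECTORIZED_KEY, "true")
    spark.conf.set(VECTORIZED_KEY, "false")
    try:
        yield spark
    finally:
        # Restore the earlier value even if the read fails.
        spark.conf.set(VECTORIZED_KEY, previous)


# Hypothetical usage in a notebook (source_path and table_name are placeholders):
# with vectorized_parquet_disabled(spark):
#     spark.read.parquet(source_path).write.format("delta").saveAsTable(table_name)
```

The write has to happen inside the `with` block, because Spark only reads the Parquet files when an action runs.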
Anonymous
by Not applicable
  • 2486 Views
  • 2 replies
  • 2 kudos

Hello Everyone, I'm interested to learn about the certifications you're pursuing to enhance your skills. Sharing your goals can inspire those ...

Hello Everyone, I'm interested to learn about the certifications you're pursuing to enhance your skills. Sharing your goals can inspire those who may have started their certification journey but struggled with motivation. Personally, I recently comple...

Latest Reply
FJ
Contributor III
  • 2 kudos

I'm trying the Data Engineering Professional exam at the end of the month. It's like a shot in the dark because no practice exams are available and, from what I've seen online from people who already passed it, the Advanced Data Engineering with ...

1 More Replies
Anonymous
by Not applicable
  • 9239 Views
  • 8 replies
  • 0 kudos

Not able to connect to On-Prem Oracle from Databricks cluster

Hi Everyone, I was trying to connect to an Oracle instance from a Databricks cluster and it is giving the error below:
java.sql.SQLTimeoutException: ORA-12170: Cannot connect. TCP connect timeout of 30000ms for host xx.x.x.*** port 1521. (CONNECTION_ID=CgM7V7UB...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Satya89: The error message you received indicates that the TCP connection to the Oracle database timed out. This could be caused by a number of factors such as network issues, firewall restrictions, or the database being overloaded. Here are a few ste...

7 More Replies
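
To separate the network and firewall causes mentioned above from database-side ones, a quick check is whether a plain TCP connection to the listener port opens at all from the cluster. This is a minimal sketch using only the Python standard library; `can_reach` is a hypothetical helper, and the host and port would be the ones from your own connection string.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False
```

Run from a notebook cell on the cluster, a `False` result for the Oracle host on port 1521 would point at routing or firewall rules rather than at the JDBC driver or credentials.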
rusty9876543
by New Contributor II
  • 8200 Views
  • 5 replies
  • 2 kudos

Split dataFrame into 1MB chunks and create a single json array with each row in chunk being an array element

Hi, I have a dataFrame that I've been able to convert into a struct with each row being a JSON object.I want the ability to split the data frame into 1MB chunks. Once I have the chunks, I would like to add all rows in each respective chunk into a sin...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Tamoor Mirza: You can use the to_json method of a DataFrame to convert each chunk to a JSON string, and then append those JSON strings to a list. Here is an example code snippet that splits a DataFrame into 1MB chunks and creates a list of JSON arr...

4 More Replies
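
The reply's own snippet is cut off above, so what follows is not that code. It is a separate sketch of the chunking idea in plain Python: rows are packed greedily into JSON arrays whose UTF-8 size stays under a limit. `json_array_chunks` is a hypothetical helper, and treating "1MB" as 1,000,000 bytes is an assumption.

```python
import json
from typing import Iterable, Iterator


def json_array_chunks(rows: Iterable[dict], max_bytes: int = 1_000_000) -> Iterator[str]:
    """Greedily pack rows into JSON array strings of at most max_bytes (UTF-8).

    A single row larger than max_bytes is still emitted, alone in its own array.
    """
    current, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for row in rows:
        encoded = json.dumps(row, separators=(",", ":"))
        row_size = len(encoded.encode("utf-8"))
        extra = row_size + (1 if current else 0)  # 1 byte for the "," separator
        if current and size + extra > max_bytes:
            yield "[" + ",".join(current) + "]"
            current, size = [], 2
            extra = row_size
        current.append(encoded)
        size += extra
    if current:
        yield "[" + ",".join(current) + "]"
```

With a Spark DataFrame, the rows could come from something like `(r.asDict() for r in df.toLocalIterator())`, which streams rows through the driver. That is fine for moderate data sizes but is not a distributed solution.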
Hansjoerg
by New Contributor
  • 2253 Views
  • 2 replies
  • 0 kudos

Resolved! Is Azure AD Conditional Access also possible for the Databricks Account Console?

I wonder whether conditional access in Azure AD for Databricks (https://learn.microsoft.com/en-us/azure/databricks/administration-guide/access-control/conditional-access?source=docs) can be configured separately for the account console (https://accou...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Hansjörg Wingeier Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ans...

1 More Replies
tototox
by New Contributor III
  • 8031 Views
  • 3 replies
  • 0 kudos

Using dbutils.fs.ls gives overlap error.

I created a schema with that route as a managed location. (abfss://~~@~~.dfs.core.windows.net/dejeong) And an external table named 'first_table' was created in the corresponding path. (abfss://~~@~~.dfs.core.windows.net/dejeong/first_table) The results ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @jin park Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we c...

2 More Replies
Pien
by New Contributor II
  • 6108 Views
  • 2 replies
  • 0 kudos

Resolved! Change data format in an existing DB table

I got errors of incompatible filetypes while converting to pyspark df, so I changed all columns to string types. Now I'm trying to add this df to an existing table (where not everything was a string type). And I'm getting an error of incompatible da...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Pien Derkx Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

1 More Replies
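
For the all-strings DataFrame described above, one possible route is to cast each column back to the type the existing table expects before appending. The sketch below only builds the cast expressions; `cast_expressions` is a hypothetical helper, and whether this matches the accepted answer in the thread is unknown, since the replies are not shown.

```python
from typing import Dict, List


def cast_expressions(target_types: Dict[str, str]) -> List[str]:
    """Build one CAST expression per column, suitable for DataFrame.selectExpr."""
    return [
        f"CAST(`{name}` AS {dtype}) AS `{name}`"
        for name, dtype in target_types.items()
    ]


print(cast_expressions({"id": "INT", "amount": "DECIMAL(10,2)"}))
```

In a notebook, the target types could be taken from the existing table with `dict(spark.table(table_name).dtypes)` and applied with `df.selectExpr(*cast_expressions(...))`; `table_name` is a placeholder here. Values that cannot be parsed into the target type may come back as null or raise an error, depending on the ANSI setting, so it is worth checking the result before writing.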
kaileena
by New Contributor
  • 2714 Views
  • 2 replies
  • 0 kudos

Error in library(RMySQL): there is no package called ‘RMySQL’

I tried to install RMySQL on Databricks like this:
install.packages("RMySQL")
I got this error:
Installing package into ‘/local_disk0/.ephemeral_nfs/envs/rEnv-c677bc4c-e6a3-40df-a5ab-bfd5d277e0c0’ (as ‘lib’ is unspecified)
Warning: unable to access inde...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @miru miro Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

1 More Replies
