Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 4863 Views
  • 2 replies
  • 3 kudos

Resolved! Can I install notebook scoped JAR/Maven libraries?

The notebook-scoped libraries are very handy. Is it possible to leverage the same for Maven JARs or application JARs as well?
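
A hedged sketch of the usual workaround, since notebook-scoped installs via %pip only cover Python packages: a Maven or application JAR can instead be installed cluster-scoped through the Libraries API. The host, token, cluster ID, and Maven coordinates below are placeholders, not values from the thread.

```
import requests

# Cluster-scoped alternative to notebook-scoped libraries (placeholders throughout).
host = "https://<databricks-instance>.cloud.databricks.com"
token = "<personal-access-token>"

payload = {
    "cluster_id": "<cluster-id>",
    "libraries": [
        # Hypothetical Maven coordinates; an application JAR would use {"jar": "<path-to-jar>"} instead.
        {"maven": {"coordinates": "com.example:my-library:1.0.0"}}
    ],
}

resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
```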

Latest Reply
Pratik_Ghosh
New Contributor II
  • 3 kudos

Any further update on this topic?

1 More Replies
Ruby8376
by Valued Contributor
  • 996 Views
  • 0 replies
  • 0 kudos

Schema definition help in Scala notebook in Databricks

I am building a schema for an incoming Avro file (JSON message) and creating a final dataframe for it. The schema looks fine against the JSON sample message provided, but I am getting null values in all the fields. Can somebody look at this code and...
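
The code itself is not shown in the excerpt, so purely as an illustration: a common reason for all-null fields is a schema whose field names or nesting do not line up with the actual JSON payload. A minimal sketch in Python (the post uses Scala, but the behaviour is the same), with hypothetical field names:

```
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Hypothetical schema; with spark.read.json, any field that does not match the
# declared name/nesting comes back as null, which matches the symptom described.
schema = StructType([
    StructField("id", LongType(), True),
    StructField("payload", StructType([          # nested objects must be declared as structs
        StructField("event_type", StringType(), True),
        StructField("event_ts", StringType(), True),
    ]), True),
])

df = spark.read.schema(schema).json("/path/to/sample_messages/")
df.printSchema()
df.show(truncate=False)
```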

erigaud
by Honored Contributor
  • 25750 Views
  • 5 replies
  • 8 kudos

Resolved! Gracefully stop a job based on condition

Hello, I have a job with many tasks running on a schedule, and the first task checks a condition. Based on the condition, I would either want to continue the job as normal, or stop right away and not run any of the other tasks. Is there a way to d...

Latest Reply
erigaud
Honored Contributor
  • 8 kudos

I think the best way to accomplish this would be to either propagate the check, as mentioned by @menotron, or have the initial task in another job, and only run the second job if the condition is met. Obviously it depends on the use case. Thank you ...
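
A minimal sketch of the "propagate the check" approach using task values; the task key, table, and condition below are illustrative, not taken from the thread.

```
# --- first task ("check_condition"): evaluate and publish the condition ---
should_continue = spark.table("control.run_flags").filter("flag = 'run_today'").count() > 0
dbutils.jobs.taskValues.set(key="should_continue", value=should_continue)

# --- any downstream task: read the value and exit early instead of doing work ---
should_continue = dbutils.jobs.taskValues.get(
    taskKey="check_condition", key="should_continue", default=False, debugValue=True
)
if not should_continue:
    dbutils.notebook.exit("Condition not met - skipping remaining work")
```

Where the workflow-level If/else condition task type is available, it can consume the same task value and skip the downstream tasks without any notebook-side exit logic.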

4 More Replies
Ria
by New Contributor
  • 3774 Views
  • 4 replies
  • 0 kudos

How to build master workflow for all the jobs present in workflow using databricks?

Suppose multiple jobs have been created using Databricks Workflows; now the requirement is to make one master workflow that triggers all the workflows depending on different conditions, e.g. some are supposed to trigger on a daily basis, some on mon...

Latest Reply
pvignesh92
Honored Contributor
  • 0 kudos

Hi @Ria, this feature was in development when I attended the last quarterly roadmap session, and I believe it is available in the latest versions or could even be in Private Preview. You can check with your Databricks Solution Architect. Even if not now, could be ...
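
Until such a feature is available in a given workspace, one hedged sketch of a "master" job is a notebook that triggers the existing jobs through the Jobs API run-now endpoint. The job IDs, schedule logic, host, and token below are placeholders.

```
import datetime
import requests

host = "https://<databricks-instance>.cloud.databricks.com"
token = "<personal-access-token>"

today = datetime.date.today()
daily_job_ids = [111, 222]                         # hypothetical job IDs
monthly_job_ids = [333] if today.day == 1 else []  # run monthly jobs on the 1st only

for job_id in daily_job_ids + monthly_job_ids:
    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": job_id},
    )
    resp.raise_for_status()
    print(f"Triggered job {job_id}: run_id={resp.json()['run_id']}")
```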

3 More Replies
s041507
by New Contributor
  • 1001 Views
  • 0 replies
  • 0 kudos

Autoloader cannot load files from Repos with runtime 13.0+

Since runtime 13.0+ it is no longer possible to reference Repos files with Auto Loader using the "file:" prefix, e.g. "file:/Workspace/Repos/...". This was working before, but now Auto Loader throws an error: com.databricks.sql.cloudfiles.errors.Cloud...
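
For reference, a minimal sketch of the pattern being described; the paths and file format are illustrative. This is the usage that reportedly worked before runtime 13.0 and now raises the cloudFiles error.

```
# Auto Loader pointed at a Repos path via the "file:" prefix (illustrative paths).
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/repos_demo")
    .load("file:/Workspace/Repos/<user>/<repo>/data/")
)
```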

Data Engineering
autoloader
Repos
Magnus
by Contributor
  • 3718 Views
  • 3 replies
  • 1 kudos

Auto Loader fails when reading json element containing space

I'm using Auto Loader as part of a Delta Live Tables pipeline to ingest JSON files, and today it failed with this error message: com.databricks.sql.transaction.tahoe.DeltaAnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column ...

Data Engineering
Auto Loader
Delta Live Tables
Latest Reply
Tharun-Kumar
Databricks Employee
  • 1 kudos

@Magnus You can read the input file using Pandas or Koalas (https://koalas.readthedocs.io/en/latest/index.html), then rename the columns, then convert the Pandas/Koalas DataFrame to a Spark DataFrame. You can write it back with the correct column names, so ...
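
A minimal sketch of that workaround, assuming a line-delimited JSON input; the paths and target table are illustrative.

```
import pandas as pd

# Read the raw file on the driver with pandas.
pdf = pd.read_json("/dbfs/path/to/input.json", lines=True)

# Replace the characters Delta rejects in column names (space, ",;{}()\n\t=").
pdf.columns = [c.replace(" ", "_") for c in pdf.columns]

# Hand the cleaned frame back to Spark and write it with valid column names.
sdf = spark.createDataFrame(pdf)
sdf.write.format("delta").mode("append").saveAsTable("bronze.cleaned_events")  # hypothetical table
```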

2 More Replies
207474
by New Contributor
  • 2867 Views
  • 3 replies
  • 2 kudos

How do I get the total number of queries run per day on a databricks SQL warehouse/endpoint?

I am trying to access the API: GET https://<databricks-instance>.cloud.databricks.com/api/2.0/sql/history/queries
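
A hedged sketch of counting queries per day from that endpoint; the response field names used here (res, has_next_page, next_page_token, query_start_time_ms) follow the documented Query History API but are worth verifying, and host and token are placeholders.

```
from collections import Counter
from datetime import datetime
import requests

host = "https://<databricks-instance>.cloud.databricks.com"
token = "<personal-access-token>"

per_day = Counter()
params = {"max_results": 1000}

while True:
    resp = requests.get(
        f"{host}/api/2.0/sql/history/queries",
        headers={"Authorization": f"Bearer {token}"},
        params=params,
    )
    resp.raise_for_status()
    body = resp.json()
    for q in body.get("res", []):
        day = datetime.fromtimestamp(q["query_start_time_ms"] / 1000).date()
        per_day[day] += 1
    if not body.get("has_next_page"):
        break
    params = {"page_token": body["next_page_token"], "max_results": 1000}

for day, count in sorted(per_day.items()):
    print(day, count)
```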

Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hi there @Sravan Burla, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you...

2 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 1789 Views
  • 0 replies
  • 0 kudos

Delta Live Tables Example Questions

I am testing some examples of Delta Live Tables from https://github.com/databricks/delta-live-tables-notebooks/tree/main/divvy-bike-demo. I have run all the relevant ingestion files: python-weatherinfo-api-ingest.py, python-divvybike-api-ingest-st...

Nick_Hughes
by New Contributor III
  • 13005 Views
  • 3 replies
  • 1 kudos

Best way to generate fake data using underlying schema

Hi, we are trying to generate fake data to run our tests. For example, we have a pipeline that creates a gold-layer fact table from 6 underlying source tables in our silver layer. We want to generate the data in a way that recognises the relationships ...

Latest Reply
RonanStokes_DB
Databricks Employee
  • 1 kudos

Hi @Nick_Hughes, this may be late for your scenario, but hopefully others facing similar issues will find it useful. You can specify how data is generated in `dbldatagen` using rules in the data generation spec. If rules are specified for data generat...
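
A minimal dbldatagen sketch along those lines; the table names, columns, value ranges, and row counts are illustrative, and the exact option names are worth checking against the dbldatagen documentation.

```
import dbldatagen as dg
from pyspark.sql.types import IntegerType, StringType

# Dimension: 5,000 customers with sequential ids.
customers = (
    dg.DataGenerator(spark, name="dim_customer", rows=5000, partitions=4)
    .withIdOutput()                                                        # adds an "id" column
    .withColumn("country", StringType(), values=["SE", "NO", "DK"], random=True)
    .build()
)

# Fact: the foreign key is drawn from the same id range, so joins to the dimension resolve.
orders = (
    dg.DataGenerator(spark, name="fact_orders", rows=100_000, partitions=8)
    .withColumn("customer_id", IntegerType(), minValue=0, maxValue=4999, random=True)
    .withColumn("amount", "decimal(10,2)", minValue=1.0, maxValue=999.99, random=True)
    .build()
)
```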

2 More Replies
Raghav2
by New Contributor
  • 9444 Views
  • 1 reply
  • 0 kudos

AnalysisException: [COLUMN_ALREADY_EXISTS] The column `<col>` already exists. Consider to choose an

Hey guys, I'm facing this exception while trying to read a public S3 bucket: "AnalysisException: [COLUMN_ALREADY_EXISTS] The column `<column name>` already exists. Consider to choose another name or rename the existing column." Also, the thing is I...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

You can use dbutils to read the file: %fs head <s3 path>
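
A minimal sketch of inspecting the raw file before Spark parses it, to spot duplicate column names in the header; the S3 path is a placeholder.

```
# Print the first 1 KB of the file from a notebook cell.
print(dbutils.fs.head("s3://<bucket>/<prefix>/file.csv", 1024))

# Equivalent magic-command form:
# %fs head s3://<bucket>/<prefix>/file.csv
```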

kll
by New Contributor III
  • 17838 Views
  • 4 replies
  • 0 kudos

PythonException: TypeError: float() argument must be a string or a number, not 'NoneType'

I get a PythonException: float() argument must be a string or a number, not 'NoneType' when attempting to save a DataFrame as a Delta table. Here's the line of code that I am running: ```df.write.format("delta").saveAsTable("schema1.df_table", mode="...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

Even though the code throws the error during the write, the issue can be earlier in the code, as Spark is lazily evaluated. The error "TypeError: float() argument must be a string or a number, not 'NoneType'" generally comes when we pass a variable to float...
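
As an illustration only (the upstream code is not shown in the thread), a sketch of the failure mode and a guard against it; the UDF and column names are hypothetical.

```
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

@F.udf(DoubleType())
def to_float(value):
    # float(None) raises "TypeError: float() argument must be a string or a number,
    # not 'NoneType'", so guard against missing values instead of casting blindly.
    return float(value) if value is not None else None

df = spark.createDataFrame([("1.5",), (None,)], ["raw_value"])

# The UDF only runs when the lazily-built plan is executed, i.e. at write time,
# which is why the error surfaces on saveAsTable rather than earlier.
df.withColumn("parsed", to_float("raw_value")) \
  .write.format("delta").mode("overwrite").saveAsTable("schema1.df_table")
```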

3 More Replies
erigaud
by Honored Contributor
  • 10073 Views
  • 4 replies
  • 6 kudos

Resolved! Save to parquet with fixed size

I have a large dataframe (>1 TB) that I have to save in Parquet format (not Delta for this use case). When I save the dataframe using .format("parquet"), it results in several Parquet files. I want these files to be a specific size (i.e. not larger than 500 MB...

Latest Reply
Lakshay
Databricks Employee
  • 6 kudos

In addition to the solutions provided above, we can also control the behavior by specifying the maximum records per file, if we have a rough estimate of how many records should be written to a file to reach a 500 MB size: df.write.option("maxRecordsPerFile",...
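
A minimal sketch of that option; the 500 MB target and the per-row size estimate are illustrative.

```
# Rough sizing: how many rows fit in ~500 MB, based on a sampled average row size.
target_bytes = 500 * 1024 * 1024
approx_bytes_per_row = 200          # estimate, e.g. measured from a small sample
max_records = target_bytes // approx_bytes_per_row

(
    df.write
    .option("maxRecordsPerFile", max_records)
    .format("parquet")
    .mode("overwrite")
    .save("/mnt/output/large_dataset")   # hypothetical output path
)
```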

3 More Replies
kll
by New Contributor III
  • 9383 Views
  • 5 replies
  • 0 kudos

AnalysisException : when attempting to save a spark DataFrame as delta table

I get an `AnalysisException: Failed to merge incompatible data types LongType and StringType` when attempting to run the command `df.write.format("delta").saveAsTable("schema.k_adhoc.df", mode="overwrite")`. I am casting the column before saving:...

Latest Reply
Lakshay
Databricks Employee
  • 0 kudos

The issue seems to be that the job is trying to merge columns with different data types. Could you please make sure that the schemas of the columns match?
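
A hedged sketch of aligning a column's type with the existing table before the write; the column name `id` is hypothetical.

```
from pyspark.sql import functions as F

# Look up the type the existing Delta table expects for the offending column.
target_type = spark.table("schema.k_adhoc.df").schema["id"].dataType

# Cast the incoming column to that type so Delta does not have to merge
# LongType with StringType.
df_fixed = df.withColumn("id", F.col("id").cast(target_type))
df_fixed.write.format("delta").mode("overwrite").saveAsTable("schema.k_adhoc.df")
```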

4 More Replies
alexisjohnson
by New Contributor III
  • 16264 Views
  • 5 replies
  • 7 kudos

Resolved! Window function using last/last_value with PARTITION BY/ORDER BY has unexpected results

Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this: select col1, col2, last_value(col2) over (partition by col1 order by col2) as column2_last from values ...

Latest Reply
Carv
New Contributor II
  • 7 kudos

For those stumbling across this: it seems LAST_VALUE emulates the same functionality as it does in SQL Server, which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the below syntax. I understand l...
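
A minimal sketch of that frame adjustment, run through spark.sql with illustrative sample values. Without an explicit frame, ORDER BY defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the current row; widening the frame gives the true per-partition last value.

```
result = spark.sql("""
    SELECT
        col1,
        col2,
        LAST_VALUE(col2) OVER (
            PARTITION BY col1 ORDER BY col2
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS column2_last
    FROM VALUES ('a', 1), ('a', 2), ('b', 3) AS t(col1, col2)
""")
result.show()
```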

4 More Replies
Enzo_Bahrami
by New Contributor III
  • 895 Views
  • 0 replies
  • 0 kudos

Connect File Arrival Trigger to on-prem file server

Hello everyone! I was wondering if there is any way to connect a File Arrival Trigger to an on-prem file server. Can I use JDBC or ODBC? Will those connect to an on-prem file server (not a SQL Server)? Thank you

Data Engineering
File Arrival Trigger
